Learning Against Non-Stationary Agents with Opponent Modelling & Deep Reinforcement Learning
Abstract
Humans, like all animals, both cooperate and compete with each other. Through these interactions we learn to observe, act, and manipulate to maximise our utility function, and continue doing so as others learn with us. This is a decentralised non-stationary learning problem, where to survive and flourish an agent must adapt to the gradual changes of other agents as they learn, as well as capitalise on sudden shifts in their behaviour. To learn in the presence of such non-stationarity, we introduce the Switching Agent Model (SAM) that combines traditional deep reinforcement learning – which typically performs poorly in such settings – with opponent modelling, using uncertainty estimations to robustly switch between multiple policies. We empirically show the success of our approach in a multi-agent continuous-action environment, demonstrating SAM’s ability to identify, track, and adapt to both gradual and sudden changes in the behaviour of non-stationary agents.
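To make the switching idea concrete, below is a minimal sketch of uncertainty-gated policy selection in the spirit of the abstract. This is not the paper's implementation: the Gaussian opponent models, the surprise threshold, and all class and parameter names are illustrative assumptions.

import numpy as np

class GaussianOpponentModel:
    """Running Gaussian over the opponent's continuous actions;
    exponential forgetting lets the estimate track gradual drift."""
    def __init__(self, dim, decay=0.99):
        self.mean = np.zeros(dim)
        self.var = np.ones(dim)
        self.decay = decay

    def surprise(self, action):
        # Squared standardised error: a cheap uncertainty estimate that
        # spikes when the opponent's behaviour shifts suddenly.
        return float(np.mean((action - self.mean) ** 2 / (self.var + 1e-8)))

    def update(self, action):
        err = action - self.mean
        self.mean += (1.0 - self.decay) * err
        self.var = self.decay * self.var + (1.0 - self.decay) * err ** 2

class PolicySwitcher:
    """One opponent model per known opponent type; play the counter-policy
    whose model best explains the opponent's latest action."""
    def __init__(self, models, counter_policies, threshold=9.0):
        self.models = models
        self.policies = counter_policies
        self.threshold = threshold
        self.active = 0

    def step(self, opp_action, obs):
        scores = [m.surprise(opp_action) for m in self.models]
        if scores[self.active] > self.threshold:     # sudden shift detected
            self.active = int(np.argmin(scores))     # switch counter-policy
        self.models[self.active].update(opp_action)  # track gradual drift
        return self.policies[self.active](obs)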
Similar resources
Learning with Opponent-Learning Awareness
Multi-agent settings are quickly gathering importance in machine learning. Beyond a plethora of recent work on deep multi-agent reinforcement learning, hierarchical reinforcement learning, generative adversarial networks and decentralized optimization can all be seen as instances of this setting. However, the presence of multiple learning agents in these settings renders the training problem no...
Combining Opponent Modeling and Model-Based Reinforcement Learning in a Two-Player Competitive Game
When an opponent with a stationary and stochastic policy is encountered in a two-player competitive game, model-free Reinforcement Learning (RL) techniques such as Q-learning and Sarsa(λ) can be used to learn near-optimal counter strategies given enough time. When an agent has learned such counter strategies against multiple diverse opponents, it is not trivial to decide which one to use when a ...
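As a concrete reminder of the baseline named above, here is a tabular Q-learning sketch. The hyperparameters and helper names are assumptions for illustration; the point is that a stationary stochastic opponent can simply be folded into the environment dynamics, so the standard update applies unchanged.

import random
from collections import defaultdict

alpha, gamma, eps = 0.1, 0.95, 0.1   # illustrative hyperparameters
Q = defaultdict(float)               # (state, action) -> value estimate

def act(state, actions):
    # Epsilon-greedy over the current Q estimates.
    if random.random() < eps:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def update(s, a, r, s_next, actions):
    # Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    target = r + gamma * max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])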
Multiagent reinforcement learning in the Iterated Prisoner's Dilemma.
Reinforcement learning (RL) is based on the idea that the tendency to produce an action should be strengthened (reinforced) if it produces favorable results, and weakened if it produces unfavorable results. Q-learning is a recent RL algorithm that does not need a model of its environment and can be used on-line. Therefore, it is well suited for use in repeated games against an unknown opponent....
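A self-contained sketch of that setting: a model-free, online Q-learner in the iterated Prisoner's Dilemma that conditions only on the opponent's previous move, playing against tit-for-tat. The conventional 3/0/5/1 payoffs and all constants are assumptions for illustration.

import random

# Row player's payoff for (my_move, opponent_move); C = cooperate, D = defect.
PAYOFF = {('C', 'C'): 3, ('C', 'D'): 0, ('D', 'C'): 5, ('D', 'D'): 1}
ACTIONS = ['C', 'D']
Q = {(s, a): 0.0 for s in ['C', 'D', None] for a in ACTIONS}
alpha, gamma, eps = 0.1, 0.9, 0.1

opp_last, my_last = None, 'C'
for t in range(10000):
    # Epsilon-greedy move, conditioned on what the opponent did last round.
    if random.random() < eps:
        a = random.choice(ACTIONS)
    else:
        a = max(ACTIONS, key=lambda x: Q[(opp_last, x)])
    opp = 'C' if t == 0 else my_last   # tit-for-tat copies our last move
    r = PAYOFF[(a, opp)]
    nxt = opp                          # next state = opponent's latest move
    Q[(opp_last, a)] += alpha * (r + gamma * max(Q[(nxt, x)] for x in ACTIONS)
                                 - Q[(opp_last, a)])
    opp_last, my_last = nxt, a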
Opponent Modeling against Non-stationary Strategies (Doctoral Consortium)
Most state-of-the-art learning algorithms do not fare well with agents (computer or human) that change their behaviour over time. This is because they usually do not model the other agents' behaviour and instead make assumptions that are too restrictive for real scenarios. Furthermore, considering that many applications demand different types of agents to work together, this should ...
On the usefulness of opponent modeling: the Kuhn Poker case study
The application of reinforcement learning algorithms to Partially Observable Stochastic Games (POSG) is challenging since each agent does not have access to the full state information and, in the case of concurrent learners, the environment has non-stationary dynamics. These problems could be partially overcome if the policies followed by the other agents were known, and, for this reason, many app...
Publication year: 2017